-
Notifications
You must be signed in to change notification settings - Fork 2.5k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement the rootless-cni-infra container imageless #8910
Conversation
[APPROVALNOTIFIER] This PR is APPROVED This pull-request has been approved by: Luap99 The full list of commands accepted by this bot can be found here. The pull request process is described here
Needs approval from an approver in each of these files:
Approvers can indicate their approval by writing |
8bde21a
to
e26ffb8
Compare
@rhatdan I need your selinux knowledge here. On fedora the rootless-cni-infra script is failing with the following error:
This access is blocked by selinux. I see this AVC:
The rooless-cni-infra container is created with selinux disabled so I don't understand why selinux is blocking access. On the host the file has the following label:
Interestingly enough it works on my test VM which has the fedora server spin installed and not workstation. |
The issue is the iptables command is attempting to create content inside of a directory labeled container_file_t. Iptables command is confined, and not allowed to write container content. |
Is the /run within the container? IE This is not the /run on the host? |
run := spec.Mount{ | ||
Destination: "/run", | ||
Type: "tmpfs", | ||
Source: "none", | ||
Options: []string{"rw", "nosuid", "nodev"}, | ||
} | ||
g.AddMount(run) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
/run
is mounted as tmpfs into the container
libpod/rootless_cni_linux.go
Outdated
WithCtrNamespace(rootlessCNIInfraContainerNamespace), | ||
WithName(rootlessCNIInfraContainerName), | ||
WithPrivileged(true), | ||
WithSecLabels([]string{"disable"}), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
selinux should be disabled for this container so I am not sure why selinux is blocking the access to this file
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The problem is the iptables command is confined. The question is who launched it? And why did it transition.
Hopefully I will have time early next week to attempt this and figure out what is going on. |
e26ffb8
to
6ee7da0
Compare
return err | ||
} | ||
// bind mount the rootfs in the userns read only | ||
if err := mount.Mount("/", rootfs, "bind", "rbind,rprivate,ro"); err != nil { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It should be noted in comment lines that "rbind,ro" is just a "best effort" to make the tree read-only, as "ro" is not recursive.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Good point. Is it possible to mount everything as readonly?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Probably possible with fuse-overlayfs
. We could also parse /proc/self/mountinfo
and bind-mount each of the entries with ro
, but that seems too complicated.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Kernel patch for recursive read-only is here (unmerged): https://lore.kernel.org/linux-fsdevel/[email protected]/T/#u
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be great to get that in.
16ed86f
to
468f86c
Compare
468f86c
to
bc1f8ce
Compare
bc1f8ce
to
5fbe9b6
Compare
I reworked this to support live migration from a previous version. If the old infra container is still running podman talks via the old api to the script. Once the old infra container is deleted the new imageless infra container will be created and podman can talk via the new api. A version label is added to the container to distinguish between old and new. |
5fbe9b6
to
3571c2f
Compare
@rhatdan Have you looked at the selinux issue? For now |
@rhatdan # selinux with container image
$ podman run alpine cat /proc/self/attr/current
system_u:system_r:container_t:s0:c280,c796
$ podman run --security-opt label=disable alpine cat /proc/self/attr/current
unconfined_u:system_r:spc_t:s0
# selinux with rootfs
$ mkdir -p ~/rootfs
$ podman unshare mount --rbind -r / ~/rootfs/
$ podman unshare mount -t tmpfs none ~/rootfs/run
$ podman run --rootfs ~/rootfs cat /proc/self/attr/current
system_u:system_r:container_t:s0:c22,c289
$ podman run --security-opt label=disable --rootfs ~/rootfs cat /proc/self/attr/current
unconfined_u:system_r:container_runtime_t:s0
# why container_runtime_t and not spc_t?
# set selinux labels manually
# spc_t fails
$ podman run --security-opt "label=user:unconfined_u" --security-opt "label=role:system_r" --security-opt "label=type:spc_t" --security-opt "label=level:s0" --rootfs ~/rootfs cat /proc/self/attr/current
{"msg":"exec container process `/usr/bin/cat`: Permission denied","level":"error","time":"2021-02-14T16:08:22.000844579Z"}
# unconfined_t works
$ podman run --security-opt "label=user:unconfined_u" --security-opt "label=role:system_r" --security-opt "label=type:unconfined_t" --security-opt "label=level:s0" --rootfs ~/rootfs cat /proc/self/attr/current
unconfined_u:system_r:unconfined_t:s0 I don't know if this is a bug or expected. Either way when I set unconfined_u:system_r:unconfined_t for the cni container it works. |
9f105e2
to
19d65ff
Compare
Do you have AVC Messages? |
ls -lZd ~/rootfs/ There are several issues here. I am thinking that we could grab the "level" label off of the rootfs and set that to run the container with so that we run as container_t, that might be the best solution for this. Check to make sure that rootfs is labeled container_file_t:MCS and then use the MCS. |
The reason you are seeing the difference on spc_t versus container_runtime_t is based on the label of ~/rootfs. The reason spc_t is failing, is that there is an entrypoint fule on spc_t, So only certain labels are allowed to "transition" to spc_t. The label of the cat command is probably not allowed as an entrypoint to spc_t. Similar we have transition rules that say when podman running as container_runtime_t runs certain executables it will |
19d65ff
to
ec8b78b
Compare
133d8d6
to
ecebe3f
Compare
As proposed by Akihiro Suda make the rootless-cni-infra container use the host rootfs instead of an image. This works by mounting the host rootfs in the user namespace to `$runroot/rootless-cni-infra` and use this as rootfs for the container. Second, rewrite the rootless-cni-infra shell script in go to remove the extra cnitool dependency which is not packaged anywhere. With that we only need the same dependencies as rootful podman which should be already installed. Advantages: - Works for all architectures podman supports. - Works without internet connection. - No extra maintainence of an extra image. Disadvantages: - Requires the dependencies to be available on the host (e.g. dnsname plugin). The user may not have control over those. Problems: - It doesn't unmount the rootfs if the the rootless-cni-infra container is stopped directly. Also the image version did not respect the `--cni-config-dir` option properly. It mounted the cni config dir only at container create time but this option can be used on podman run commands which did not worked if the rootless-cni-infra container was already running. This is only possible with the rootfs version. Live upgrading is possible. If the old infra container is still running podman talks via the old api to the script. Once the old infra container is deleted the new imageless infra container will be created and podman can talk via the new api. A version label is added to the container to distinguish between old and new. Signed-off-by: Paul Holzinger <[email protected]>
ecebe3f
to
73393fc
Compare
OK I don't think I can get this to work. There is a locking issue somewhere. While thinking about it I thought about not using a container at all and just create a netns in the podman user namespace. I have opened #9423 to implement this instead. I think this is better in the long run. |
agree |
As proposed by Akihiro Suda make the rootless-cni-infra container use
the host rootfs instead of an image. This works by mounting the host
rootfs in the user namespace to
$runroot/rootless-cni-infra
and use this as rootfs for the container.
Second, rewrite the rootless-cni-infra shell script in go to remove the
extra cnitool dependency which is not packaged anywhere. With that we
only need the same dependencies as rootful podman which should be
already installed.
Advantages:
Disadvantages:
plugin). The user may not have control over those.
Problems:
is stopped directly.
Also the image version did not respect the
--cni-config-dir
optionproperly. It mounted the cni config dir only at container create time
but this option can be used on podman run commands which did not
worked if the rootless-cni-infra container was already running.
This is only possible with the rootfs version.
Live upgrading is possible. If the old infra container is still
running podman talks via the old api to the script. Once the
old infra container is deleted the new imageless infra container
will be created and podman can talk via the new api. A version
label is added to the container to distinguish between old and new.
Fixes #8709
Signed-off-by: Paul Holzinger [email protected]